Defining a Qlik Replicate task

To work with Compose, you first need to define a Qlik Replicate task that replicates the source tables from the source endpoint to a landing zone in storage (defined as the target endpoint in the Replicate task). The landing zone should then be defined as the data source for the Compose project.

For information on which endpoints can be used in a Replicate task that lands data for Compose, see Supported hive distributions for Data Lake projects.

Configuring multiple Replicate tasks with the same landing zone is not supported.

The steps below highlight the settings that are required when using Qlik Replicate with Compose. For a full description of setting up tasks in Qlik Replicate, please refer to the Qlik Replicate Help.

Prerequisites

When defining the Replicate task, make sure the following prerequisites have been met.

- If the Landing Zone database supports append, it is recommended to select Sequence as the file format in the Replicate target endpoint settings and to set the Control Tables format (if available) to Text. This will improve performance by allowing Replicate to append to the file instead of creating a new file for every Change Data Partition.

  If the above is not possible, then it is recommended to periodically delete files that are no longer required from the target directory. This will prevent files from amassing and degrading performance. This can be done automatically using Replicate's partition retention feature. For more information, see the Qlik Replicate Help.

- When Microsoft Azure HDInsight is defined as the Replicate target endpoint, you must set the endpoint's Target storage format to Sequence.

- When Oracle is defined as the source endpoint in the Replicate task, full supplemental logging should be defined for all source table columns that exist on the target and any source columns referenced in expressions.

- When using live views, to ensure transactional consistency, it is recommended to turn off Speed partition mode in the Replicate task settings. When set to off, Replicate will close the partition only at the end of each transaction. This might require you to shorten the partition interval in order for the changes to be propagated to Compose in a timely manner. Shortening the partition interval might also require you to increase the partition cleanup frequency to prevent too many files from accumulating on the target and degrading performance.

  For information about turning off Speed partition mode, setting partitioning intervals, and partition cleanup, see the Replicate Help.
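
As an illustration of the Oracle prerequisite, the following minimal sketch generates one `ALTER TABLE ... ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS` statement per source table. Logging all columns is an assumption made here for simplicity: it covers more than the columns that exist on the target or are referenced in expressions, and your DBA may prefer a narrower log group. The function name and the table names are hypothetical.

```python
from typing import Iterable, List


def supplemental_logging_statements(tables: Iterable[str]) -> List[str]:
    """Build one ALTER TABLE statement per Oracle source table.

    Each statement enables supplemental logging for ALL columns, which is
    a simple (but broad) way to cover every replicated column and every
    column referenced in an expression.
    """
    statements = []
    for table in tables:
        name = table.strip()
        if not name:
            continue  # skip blank entries
        statements.append(
            f"ALTER TABLE {name} ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    for stmt in supplemental_logging_statements(["HR.EMPLOYEES", "HR.DEPARTMENTS"]):
        print(stmt)
```

The generated statements would be reviewed and run by someone with the appropriate privileges on the source database; the sketch only builds the text and does not connect to Oracle.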

Limitations and Considerations

- Replicate allows you to define global transformations that are applied to source/Change tables during task runtime. The following global transformations, however, should not be defined (as they are not compatible with Compose tasks):
  - Rename Change Table
  - Rename Change Table schema
- The Create target control tables in schema option in the Replicate task settings' Control Table tab is not supported.
- As Compose does not use the before-image for UPDATE operations, it is recommended to set On UPDATE in the Store Changes Settings tab of the Replicate task settings to Store after image only. Note that this should only be done if the Replicate task is dedicated for use with Compose.
- As Compose requires a full after-image to be able to perform Change Processing, the following Replicate source endpoints are not directly supported (as they do not provide a full after-image):
  - SAP HANA (log based)
  - Salesforce
- Compose does not support the JSON and XML data types. Therefore, columns that are usually created with these data types (by the Replicate target endpoint) should be created as STRINGs instead. This can be done automatically within Replicate using a data type transformation. For information on which target endpoints support JSON and XML data types as well as instructions on how to create a data type transformation, refer to the Replicate Help.
- If you use Replicate November 2022 to land data in Databricks, only the Databricks (Cloud Storage) target endpoint can be used. If you are using an earlier supported version of Replicate, you can continue using the existing Databricks target endpoints.
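
Several of the limitations above can be checked mechanically before a task is run. The sketch below is one way to do that, assuming you keep your own plain description of each task as a Python dictionary. The dictionary keys (`source_endpoint`, `global_transformations`, `target_column_types`) and the function name are hypothetical and are not part of any Replicate export format or API.

```python
from typing import Dict, List

# Values taken from the limitations listed above.
UNSUPPORTED_TRANSFORMATIONS = {"Rename Change Table", "Rename Change Table schema"}
UNSUPPORTED_SOURCES = {"SAP HANA (log based)", "Salesforce"}
UNSUPPORTED_TYPES = {"JSON", "XML"}


def limitation_warnings(task: Dict) -> List[str]:
    """Return a warning for each Compose limitation the task description hits."""
    warnings = []

    # Global transformations that are not compatible with Compose tasks.
    for name in task.get("global_transformations", []):
        if name in UNSUPPORTED_TRANSFORMATIONS:
            warnings.append(f"Global transformation not compatible with Compose: {name}")

    # Source endpoints that do not provide a full after-image.
    source = task.get("source_endpoint")
    if source in UNSUPPORTED_SOURCES:
        warnings.append(f"Source endpoint not directly supported: {source}")

    # Columns that would be created as JSON or XML on the target.
    for column, data_type in task.get("target_column_types", {}).items():
        if data_type.upper() in UNSUPPORTED_TYPES:
            warnings.append(f"Column {column} should be created as STRING, not {data_type}")

    return warnings


if __name__ == "__main__":
    # Hypothetical task description, for illustration only.
    example = {
        "source_endpoint": "Salesforce",
        "global_transformations": ["Rename Change Table"],
        "target_column_types": {"payload": "JSON", "id": "INT"},
    }
    for warning in limitation_warnings(example):
        print(warning)
```

The check only covers the limitations that can be expressed as simple lookups; the Databricks endpoint restriction and the control table schema option still need to be verified in the Replicate console.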

Setting up the task

To define the task:

1. Open Qlik Replicate and in the New Task dialog, do one of the following:
   - To enable Full Load and CDC replication, enable the Full Load and Store Changes options only (the Apply Changes option should not be enabled).
   - To enable Full Load only replication (without CDC), enable the Full Load replication option only.
2. Open the Manage Endpoint Connections window and define a source and target endpoint. The target endpoint must be the Hive database where you want Compose to create the Storage Zone tables. For more information on supported endpoints, see Supported hive distributions for Data Lake projects.
3. Add the endpoints to the Replicate task and then select which source tables to replicate.
4. This step is not relevant for Full Load only tasks. To facilitate Schema evolution in Compose, select the DDL History Control Table in the Task Settings' Metadata|Control Tables tab. If you intend to scan all data sources (when performing schema evolution), then you must do this for ALL Replicate tasks that move data to the Landing Zone, even those with source endpoints that do not support schema evolution (e.g. Salesforce).

   Information note
   If you want the DDL History Control Table to be updated with any new source tables that are added during the Replicate task, you must define Table Selection Patterns in Replicate's Select Tables window.
5. This step is not relevant for Full Load only tasks. In the Task Settings' Change Processing|Store Changes Settings tab, make sure that Store Changes in is set to Change tables.
6. This step is not relevant for Full Load only tasks. In the Task Settings' Change Processing|Store Changes Settings tab, enable Change Data Partitioning.
7. This step is not relevant for Full Load only tasks. In the Task Settings' Metadata|Control Tables tab, select the Change Data Partitioning Control Table.
8. This step is not relevant for Full Load only tasks. If a Primary Key in a source table can be updated, it is recommended to turn on the DELETE and INSERT when updating a primary key column option in Replicate's task settings' Change Processing Tuning tab. When this option is turned on, history of the old record will not be preserved in the new record. Note that this option is supported from Replicate November 2022 only.
9. Run the task.

Wait for the Full Load replication to complete and then continue the workflow in Compose as described in Adding and managing data warehouse projects.
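
The settings described in the procedure above can be summarized as a small checklist. The following sketch assumes you record the relevant task settings by hand in a Python dictionary and want a quick sanity check before running the task. All key names (`full_load`, `store_changes`, `apply_changes`, `store_changes_in`, `change_data_partitioning`, `control_tables`) are hypothetical labels chosen for this example; they do not correspond to a Replicate API or export format.

```python
from typing import Dict, List


def check_compose_task_settings(settings: Dict) -> List[str]:
    """Return a list of problems; an empty list means the checklist passed."""
    problems = []

    # Full Load is required for both task types.
    if not settings.get("full_load"):
        problems.append("Full Load must be enabled")

    # Apply Changes should not be enabled for tasks that land data for Compose.
    if settings.get("apply_changes"):
        problems.append("Apply Changes should not be enabled")

    # The remaining checks are not relevant for Full Load only tasks.
    if settings.get("store_changes"):
        if settings.get("store_changes_in") != "Change tables":
            problems.append("Store Changes in must be set to Change tables")
        if not settings.get("change_data_partitioning"):
            problems.append("Change Data Partitioning must be enabled")
        selected = set(settings.get("control_tables", []))
        for required in ("DDL History", "Change Data Partitioning"):
            if required not in selected:
                problems.append(f"Control table not selected: {required}")

    return problems


if __name__ == "__main__":
    # Hypothetical Full Load and CDC task, for illustration only.
    task = {
        "full_load": True,
        "store_changes": True,
        "apply_changes": False,
        "store_changes_in": "Change tables",
        "change_data_partitioning": True,
        "control_tables": ["DDL History", "Change Data Partitioning"],
    }
    print(check_compose_task_settings(task) or "All checks passed")
```

Note that the sketch treats the DDL History Control Table as required for every CDC task, whereas the procedure selects it to facilitate schema evolution; relax that check if schema evolution is not used.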
